Discovering Knowledge through Multi-modal Association Rule Mining for Document Image Analysis
Abstract
This paper introduces a descriptive data mining method for discovering knowledge that supports automatic categorization in document image analysis. We argue that a document image is a multi-modal unit of analysis whose semantics are deduced from a combination of textual content, layout structure, and logical structure. The method therefore considers several modalities of document representation simultaneously, and hence different types of information: spatial information derived from a complex document image analysis process (layout analysis), information extracted from the logical structure of the document (by means of document image classification and understanding), and textual information extracted by means of OCR. The proposed method is based on a relational data mining approach to discovering association rules; the relational setting is justified by its suitability for analyzing data available in more than one modality. Experimental results on a real-world dataset are reported.
Similar resources
A Novel Method for Selecting the Supplier Based on Association Rule Mining
One of the important problems in supply chain management is supplier selection. A company holds massive amounts of data from various departments, so extracting knowledge from that data is very complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...
A New Model for Discovering XML Association Rules from XML Documents
The inherent flexibility of XML in both structure and semantics makes mining XML data a complex task, with more challenges than traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We directly use frequent subtree mining techniques in the disco...
Multi-level Association Rule Mining: an Object-oriented Approach Based on Dynamic Hierarchies
Previous studies in data mining have yielded efficient algorithms for discovering association rules. But it is a well-known problem that the two controlling measures of support and confidence, when used as the sole definition of relevant association rules, are too inclusive: interesting rules are included with many uninteresting cases. A typical approach to this problem is to augment the thresholds ...
A Survey on Infrequent Weighted Itemset Mining Approaches
Association Rule Mining (ARM) is one of the most popular data mining techniques. All existing work is based on frequent itemsets. Frequent itemsets find application in a number of real-life contexts, e.g., market basket analysis, medical image processing, and biological data analysis. In recent years, the attention of researchers has focused on infrequent itemset mining. This paper tackles the issue...
Mining Association Rules from Unstructured Documents
This paper presents a system called EART (Extract Association Rules from Text) for discovering association rules from collections of unstructured documents. The EART system handles text only, not images or figures. EART discovers association rules among the keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transf...